Automating rule generation for grammar checkers

Author

  • Marcin Milkowski

Abstract

In this paper, I describe several approaches to the automatic or semi-automatic development of symbolic rules for grammar checkers from the information contained in corpora. Rules obtained this way are an important addition to the manually created rules that seem to dominate in rule-based checkers. The manual creation of rules, however, is costly, time-consuming and error-prone. It therefore seems advisable to use machine-learning algorithms to create rules automatically or semi-automatically. The results obtained seem to corroborate our initial hypothesis that symbolic machine-learning algorithms can be useful for acquiring new rules for grammar checking. It turns out, however, that for practical uses error corpora cannot be the sole source of information used in grammar checking. We therefore suggest that only by combining different approaches will grammar checkers, or more generally computer-aided proofreading tools, be able to cover the most frequent and severe mistakes and avoid the false alarms that seem to distract users. In what follows, I will show how Transformation-Based Learning (TBL) algorithms may be used to acquire rules. Before doing that, I will discuss the pros and cons of three approaches to creating rules and show the need to make use of all of them in a successful grammar-checking tool. The results obtained seem to suggest that the machine-learning approach is indeed fruitful, and I will point to some future work related to the reported research.
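The abstract does not specify the rule templates, scoring function or corpus format used in the paper, so the following is only a minimal sketch of generic Transformation-Based Learning applied to an error corpus, under loudly stated assumptions: each erroneous sentence is aligned token by token with its correction (so only single-token substitutions are covered), and the `Rule` template (replace one word depending on the previous or next word), `learn`, `score` and the toy data are all hypothetical illustrations, not the author's implementation. The loop follows the usual TBL scheme: propose candidate rules at positions that are still wrong, keep the one with the highest net gain, apply it to the whole corpus, and repeat.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule template: replace `src` with `dst` when the token
    at relative position `offset` (-1 = previous, +1 = next) equals `ctx`."""
    src: str
    dst: str
    offset: int
    ctx: str

    def matches(self, tokens: list[str], i: int) -> bool:
        j = i + self.offset
        return tokens[i] == self.src and 0 <= j < len(tokens) and tokens[j] == self.ctx


def apply_rule(rule: Rule, tokens: list[str]) -> list[str]:
    # Test every position against the unmodified sentence, so one rewrite
    # cannot trigger another within the same pass.
    return [rule.dst if rule.matches(tokens, i) else t for i, t in enumerate(tokens)]


def candidate_rules(current, gold) -> set[Rule]:
    # Instantiate the templates only at positions that are still wrong.
    cands = set()
    for cur, ref in zip(current, gold):
        for i, (c, g) in enumerate(zip(cur, ref)):
            if c == g:
                continue
            for off in (-1, 1):
                if 0 <= i + off < len(cur):
                    cands.add(Rule(c, g, off, cur[i + off]))
    return cands


def score(rule: Rule, current, gold) -> int:
    # Net gain: errors fixed minus correct tokens broken (false alarms).
    gain = 0
    for cur, ref in zip(current, gold):
        for i in range(len(cur)):
            if rule.matches(cur, i):
                if rule.dst == ref[i]:
                    gain += 1
                elif cur[i] == ref[i]:
                    gain -= 1
    return gain


def learn(errors, gold, min_gain: int = 1, max_rules: int = 10) -> list[Rule]:
    """Greedy TBL loop: pick the best-scoring rule, apply it, repeat."""
    current = [list(s) for s in errors]
    learned: list[Rule] = []
    while len(learned) < max_rules:
        cands = candidate_rules(current, gold)
        if not cands:
            break
        # Highest gain wins; ties are broken deterministically by tuple order.
        best = min(cands, key=lambda r: (-score(r, current, gold), r))
        if score(best, current, gold) < min_gain:
            break
        learned.append(best)
        current = [apply_rule(best, s) for s in current]
    return learned


if __name__ == "__main__":
    # Toy aligned error corpus (hypothetical data, substitution errors only).
    errors = [s.split() for s in ["I could of gone", "she could of known",
                                  "they should of left", "a piece of cake"]]
    gold = [s.split() for s in ["I could have gone", "she could have known",
                                "they should have left", "a piece of cake"]]
    for rule in learn(errors, gold):
        print(f"{rule.src} -> {rule.dst} if token at {rule.offset:+d} is '{rule.ctx}'")
```

On this toy corpus the loop first learns "of -> have if token at -1 is 'could'" (net gain 2) and then the analogous rule for 'should', after which no errors remain. Because `score` subtracts every correct token a rule would change, a rule such as "of -> have before 'cake'" scores -1 and is never selected; this is one simple way of trading coverage against the false alarms the abstract warns about.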

Similar resources

Grammar Checkers for Natural Languages: a Review

Natural language processing is an interdisciplinary branch of linguistics and computer science studied under Artificial Intelligence (AI) that gave birth to an allied area called ‘Computational Linguistics’, which focuses on processing natural languages on computational devices. A natural language consists of many sentences, which are meaningful linguistic units involving one or more words ...

A Survey of Grammar Checkers for Natural Languages

Natural language processing is an interdisciplinary branch of linguistics and computer science studied under Artificial Intelligence (AI) that gave birth to an allied area called ‘Computational Linguistics’, which focuses on processing natural languages on computational devices. A natural language consists of a large number of sentences, which are linguistic units involving one or more words...

Automating the Generation of a Wide-coverage LFG for French using a MetaGrammar

In this paper, we explain how the notion of MetaGrammar, which has successfully been used for generating wide-coverage tree adjoining grammars (TAGs) for various languages such as French (Abeillé et al. (1999)) and German (Gerdes (2002)), may be used to generate a wide-coverage Lexical Functional Grammar (LFG) for French. We first introduce the notion of MetaGrammar and present the tools we use...

Evaluating Two Web-based Grammar Checkers: Microsoft ESL Assistant and NTNU Statistical Grammar Checker

Many ESL students need to improve writing skills to pass various language tests; thus, writing teachers need to read many compositions and provide feedback. To help ESL teachers reduce their teaching load and to give students faster feedback, various English grammar checkers have been developed. Few of these PC-based grammar checkers, however, are widely available to ESL learners. As the Intern...

Linguistics behind the mirror

A natural language is usually modelled as a subset of the set $T^*$ of strings (over some set $T$ of terminals) generated by some grammar $G$. Thus, $T^*$ is divided into two disjoint classes: grammatical and ungrammatical strings (any string not generated by $G$ is considered ungrammatical). This approach brings along the following problems: – on the theoretical side, it is impossible to rule out c...

Journal title:
  • CoRR

Volume: abs/1211.6887

Publication date: 2011